Identify Lines Using HITRAN
The key tasks in this tutorial are to:
3.1 | Load the baseline corrected spectrum
3.2 | Load the HITRAN linelist for that molecule
3.3 | Find the peaks automatically using the SciPy peak finder
3.4 | Match the HITRAN lines to the peaks
3.5 | Report, plot, and save the results
[1]:
# Import necessary modules
import os            # environment variables and file paths
import pandas as pd  # reading and writing CSV files
from Xpectra.SpecFitAnalyzer import SpecFitAnalyzer
from Xpectra.LineAssigner import *
from Xpectra.SpecStatVisualizer import plot_fitted_als_bokeh, plot_spectra_errorbar_bokeh
3.1 Load the original and baseline-corrected spectra
\(\rightarrow\) In step 2, we corrected the spectral baseline and saved the result as a CSV file in the processed_data directory. Here we point to that file and load it into a DataFrame:
[2]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")
# Import baseline corrected spectrum
corrected_spectrum = pd.read_csv(os.path.join(__reference_data_path__,'processed_data','arpls_baseline_corrected_methane_spectrum.csv'))
# Assign wavenumber (x) and signal (y) arrays
x = corrected_spectrum['original_x'].dropna().to_numpy()
y = corrected_spectrum['original_y'].dropna().to_numpy()
x_baseline_corr = corrected_spectrum['baseline_corrected_x'].dropna().to_numpy()
y_baseline_corr = corrected_spectrum['baseline_corrected_y'].dropna().to_numpy()
\(\rightarrow\) Visualize the imported spectrum together with its fitted baseline:
[3]:
# Obtain previously fitted baseline by reverse correcting the spectrum
spectral_baseline = y - y_baseline_corr
plot_fitted_als_bokeh(wavenumber_values = x,
                      signal_values = y,
                      fitted_baseline = spectral_baseline,
                      baseline_type = 'arpls'
                      )
3.2 Load HITRAN linelist and parse them
\(\rightarrow\) The next step is to load the HITRAN line list into a DataFrame. For this, we use the LineAssigner module, instantiating it with the baseline-corrected spectrum and the HITRAN file path.
[4]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")
# Define path to HITRAN data
input_file = os.path.join(__reference_data_path__, 'datasets','CH4_nu3.par')
# Initialize LineAssigner
assign = LineAssigner(wavenumber_values = x_baseline_corr,
                      signal_values = y_baseline_corr,
                      hitran_file = input_file,
                      absorber_name = 'CH4')
\(\rightarrow\) With the class initialized, we now parse the line list to a DataFrame. The default columns converted to the DataFrame are: ‘local_iso_id’, ‘nu’, ‘sw’, ‘gamma_air’, ‘local_upper_quanta’, and ‘ierr’.
\(\rightarrow\) This function automatically separates the local quanta terms into the J quantum number, the N quantum number, and the symmetry.
[5]:
# Parse file to DataFrame
assign.parse_file_to_dataframe()
[5]:
| | molec_id | local_iso_id | nu | sw | a | gamma_air | gamma_self | elower | n_air | delta_air | ... | iref | line_mixing_flag | gp | gpp | J_low | sym_low | N_low | J_up | sym_up | N_up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2 | 2900.000621 | 1.825000e-25 | 0.023890 | 0.0490 | 0.067 | 814.6845 | 0.63 | -0.005800 | ... | 64 | 3 | 3253433.0 | None | 12 | A1 | 1 | 13 | A2 | 9 |
| 1 | 6 | 2 | 2900.005693 | 6.307000e-27 | 0.005030 | 0.0470 | 0.065 | 1096.0334 | 0.62 | -0.005800 | ... | 64 | 3 | 3253433.0 | None | 14 | F2 | 3 | 14 | F1 | 40 |
| 2 | 6 | 2 | 2900.022027 | 3.048000e-27 | 0.022620 | 0.0460 | 0.060 | 1593.6378 | 0.61 | -0.005800 | ... | 64 | 3 | 3253433.0 | None | 17 | F2 | 2 | 17 | F1 | 47 |
| 3 | 6 | 1 | 2900.027223 | 1.891000e-25 | 0.000465 | 0.0480 | 0.067 | 815.1315 | 0.63 | -0.005800 | ... | 34 | 3 | 3245363.0 | None | 12 | F1 | 3 | 13 | F2 | 21 |
| 4 | 6 | 2 | 2900.035027 | 1.905000e-25 | 0.067460 | 0.0400 | 0.067 | 815.0317 | 0.63 | -0.005800 | ... | 64 | 3 | 3253433.0 | None | 12 | E | 2 | 12 | E | 25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 41623 | 6 | 2 | 3299.877822 | 1.652000e-29 | 0.000353 | 0.0450 | 0.059 | 1780.0695 | 0.60 | -0.006500 | ... | 54 | 3 | 3253433.0 | None | 18 | F2 | 3 | 19 | F1 | 85 |
| 41624 | 6 | 1 | 3299.900527 | 5.946000e-29 | 0.000004 | 0.0380 | 0.061 | 1416.5543 | 0.61 | -0.006500 | ... | 34 | 3 | 3240363.0 | None | 16 | E | 1 | 17 | E | 52 |
| 41625 | 6 | 3 | 3299.901848 | 7.204000e-29 | 0.000221 | 0.0589 | 0.077 | 532.9581 | 0.75 | -0.006346 | ... | 44 | 4 | 2243323.0 | None | 11 | E | 4 | 11 | E | 2 |
| 41626 | 6 | 1 | 3299.984795 | 2.838000e-25 | 0.035670 | 0.0470 | 0.099 | 1526.2146 | 0.75 | -0.006600 | ... | 32 | 3 | 3333232.0 | None | 6 | A2 | 1 | 6 | A1 | 31 |
| 41627 | 6 | 2 | 3299.989099 | 5.343000e-29 | 0.000730 | 0.0380 | 0.060 | 1594.0043 | 0.61 | -0.006500 | ... | 54 | 3 | 3253433.0 | None | 17 | E | 2 | 18 | E | 54 |
41628 rows × 25 columns
\(\rightarrow\) The HITRAN DataFrame is now accessible through the class attribute hitran_df:
[6]:
# Display header and first 3 rows
assign.hitran_df.head(3)
[6]:
| | molec_id | local_iso_id | nu | sw | a | gamma_air | gamma_self | elower | n_air | delta_air | ... | iref | line_mixing_flag | gp | gpp | J_low | sym_low | N_low | J_up | sym_up | N_up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2 | 2900.000621 | 1.825000e-25 | 0.02389 | 0.049 | 0.067 | 814.6845 | 0.63 | -0.0058 | ... | 64 | 3 | 3253433.0 | None | 12 | A1 | 1 | 13 | A2 | 9 |
| 1 | 6 | 2 | 2900.005693 | 6.307000e-27 | 0.00503 | 0.047 | 0.065 | 1096.0334 | 0.62 | -0.0058 | ... | 64 | 3 | 3253433.0 | None | 14 | F2 | 3 | 14 | F1 | 40 |
| 2 | 6 | 2 | 2900.022027 | 3.048000e-27 | 0.02262 | 0.046 | 0.060 | 1593.6378 | 0.61 | -0.0058 | ... | 64 | 3 | 3253433.0 | None | 17 | F2 | 2 | 17 | F1 | 47 |
3 rows × 25 columns
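\(\rightarrow\) For readers curious about what parsing a .par file involves, here is a minimal sketch that slices the leading fixed-width fields of a single record. It assumes the standard 160-character HITRAN record layout. PAR_FIELDS and parse_par_record are hypothetical names used only for illustration; they are not part of Xpectra, whose parser may work differently. The example record is synthetic, built from values in the first table row above.

```python
# Sketch only: slice the first ten fixed-width fields of one HITRAN .par record.
# Widths follow the standard 160-character HITRAN layout (assumption);
# this is not the Xpectra parser.

# (column name, field width in characters, converter)
PAR_FIELDS = [
    ("molec_id", 2, int),
    ("local_iso_id", 1, int),
    ("nu", 12, float),
    ("sw", 10, float),
    ("a", 10, float),
    ("gamma_air", 5, float),
    ("gamma_self", 5, float),
    ("elower", 10, float),
    ("n_air", 4, float),
    ("delta_air", 8, float),
]


def parse_par_record(line):
    """Return a dict holding the first ten fields of one .par record."""
    record = {}
    start = 0
    for name, width, convert in PAR_FIELDS:
        record[name] = convert(line[start:start + width])
        start += width
    return record


# Synthetic record (first 67 characters only), padded to the field widths
example_line = (
    f"{6:2d}{2:1d}{2900.000621:12.6f}{1.825e-25:10.3e}{0.02389:10.3e}"
    f"{0.049:5.3f}{0.067:5.3f}{814.6845:10.4f}{0.63:4.2f}{-0.0058:8.5f}"
)
parsed = parse_par_record(example_line)
print(parsed["nu"], parsed["sw"], parsed["elower"])
```

The remaining fields of a record (quanta strings, error and reference codes, statistical weights) can be sliced the same way by extending the width list.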
3.3 Identify peaks automatically
\(\rightarrow\) We move on to identifying the location (in wavenumber) of each peak in our methane spectrum. To accomplish this, we use the LineAssigner instance we created, which finds the peaks automatically with the SciPy peak finder and returns their centers and heights.
3.3.1 Select wavenumber range for analysis
\(\rightarrow\) Many times, we are only interested in a certain part of the spectrum, or the entire spectrum has too many peaks to process all at once. We select a range of wavenumbers for our analysis:
[7]:
wavenumber_range = (2911.15, 2911.9) # cm^-1
\(\rightarrow\) Let's visualize the spectrum within this wavenumber range:
[8]:
plot_spectra_errorbar_bokeh(wavenumber_values = x_baseline_corr,
                            signal_values = y_baseline_corr,
                            wavenumber_range = wavenumber_range,
                            absorber_name = 'CH4',
                            plot_type = 'line')
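\(\rightarrow\) Restricting a spectrum to a wavenumber range amounts to a boolean mask over the wavenumber array. The sketch below shows the idea on synthetic data; select_range is a hypothetical helper, not an Xpectra function, and it assumes the range is inclusive at both ends.

```python
import numpy as np


def select_range(wavenumbers, signal, wavenumber_range):
    """Return the points whose wavenumber lies inside the (low, high) range."""
    low, high = wavenumber_range
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], signal[mask]


# Synthetic spectrum on a regular grid (illustrative values only)
wn = np.linspace(2911.0, 2912.0, 101)
sig = np.ones_like(wn)

wn_sel, sig_sel = select_range(wn, sig, (2911.15, 2911.9))
print(wn_sel.min(), wn_sel.max(), len(wn_sel))
```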
3.3.2 Find the peaks
\(\rightarrow\) Automatically find the location of spectral peaks, setting a minimum height threshold for the algorithm and, optionally, a maximum.
[9]:
# Find peak centers (wavenumber) and plot over spectrum
peak_centers, peak_heights = assign.line_finder_auto(wavenumber_range = wavenumber_range,
                                                     peak_height_min = 0.05,
                                                     peak_height_max = 0.6,
                                                     __plot__ = True,
                                                     __print__ = True)
| | peak_centers | peak_heights |
|---|---|---|
| 0 | 2911.697485 | 0.210332 |
| 1 | 2911.676406 | 0.106295 |
| 2 | 2911.623123 | 0.571206 |
| 3 | 2911.518359 | 0.168570 |
| 4 | 2911.400698 | 0.471979 |
| 5 | 2911.348160 | 0.572617 |
| 6 | 2911.286321 | 0.439682 |
| 7 | 2911.261846 | 0.608428 |
| 8 | 2911.186087 | 0.524801 |
[10]:
assign.peak_centers_auto
[10]:
array([2911.6974854 , 2911.67640581, 2911.623123 , 2911.51835886,
2911.40069811, 2911.3481605 , 2911.28632147, 2911.26184604,
2911.18608727])
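\(\rightarrow\) To illustrate how height thresholds act in a SciPy-based peak search, here is a minimal sketch that calls scipy.signal.find_peaks directly on a synthetic spectrum. It is an assumption that line_finder_auto maps peak_height_min and peak_height_max onto the height argument in this way; the Gaussian lines below are made up for the example.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic spectrum: three Gaussian lines on a flat background
wavenumber = np.linspace(2911.0, 2912.0, 2001)
centers = [2911.2, 2911.5, 2911.8]
heights = [0.30, 0.02, 0.50]   # the middle line is below the minimum height
sigma = 0.01                   # line width in cm^-1 (illustrative)
signal = sum(h * np.exp(-0.5 * ((wavenumber - c) / sigma) ** 2)
             for c, h in zip(centers, heights))

# height=(min, max) keeps only peaks whose height lies inside the interval
idx, props = find_peaks(signal, height=(0.05, 0.6))
peak_centers = wavenumber[idx]
peak_heights = props["peak_heights"]

for center, height in zip(peak_centers, peak_heights):
    print(f"{center:.4f} cm^-1  height {height:.3f}")
```

Only the two lines with heights inside the interval are reported; the weak line near 2911.5 is rejected by the minimum threshold.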
3.4 Identify the lines
\(\rightarrow\) Compare the peaks with known lines.
\(\rightarrow\) For each peak in the lab spectrum, find the closest line in the HITRAN line list.
[11]:
# Filter the HITRAN line list
filters = {'local_iso_id' : [1,2]} # Only search the common isotopologues
# Match found lines, plot them over spectrum, and display DataFrame
assign.hitran_line_assigner(threshold = 0.01,
                            filters = filters,
                            columns_to_print = ['local_iso_id', 'J_up','nu','peak_center'], # Print over each line
                            wavenumber_range = wavenumber_range,
                            __print__ = True, # Display the fitted HITRAN DataFrame
                            __plot_bokeh__ = True, # Plot interactively with Bokeh
                            __plot_seaborn__ = False
                            )
| | molec_id | local_iso_id | nu | sw | a | gamma_air | gamma_self | elower | n_air | delta_air | ... | line_mixing_flag | gp | gpp | J_low | sym_low | N_low | J_up | sym_up | N_up | peak_center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2 | 2911.697399 | 3.172000e-29 | 0.000225 | 0.0450 | 0.060 | 1594.1021 | 0.61 | -0.005800 | ... | 3 | 3253433.0 | None | 17 | F1 | 4 | 18 | F2 | 27 | 2911.697485 |
| 1 | 6 | 1 | 2911.674563 | 7.653000e-24 | 3.939000 | 0.0390 | 0.062 | 1817.8431 | 0.75 | -0.005823 | ... | 3 | 3333232.0 | None | 9 | F2 | 7 | 8 | F1 | 75 | 2911.676406 |
| 2 | 6 | 1 | 2911.622555 | 5.719000e-23 | 0.037600 | 0.0587 | 0.070 | 575.0555 | 0.67 | -0.008330 | ... | 3 | 4345363.0 | None | 10 | A2 | 1 | 9 | A1 | 11 | 2911.623123 |
| 3 | 6 | 1 | 2911.518480 | 1.271000e-23 | 0.013930 | 0.0583 | 0.070 | 575.0525 | 0.67 | -0.008430 | ... | 3 | 4345363.0 | None | 10 | F2 | 1 | 9 | F1 | 36 | 2911.518359 |
| 4 | 6 | 1 | 2911.401080 | 4.331000e-23 | 0.047480 | 0.0573 | 0.070 | 575.1699 | 0.67 | -0.008890 | ... | 3 | 4345363.0 | None | 10 | F2 | 2 | 9 | F1 | 36 | 2911.400698 |
| 5 | 6 | 2 | 2911.348367 | 5.866000e-23 | 0.602700 | 0.0618 | 0.085 | 104.7777 | 0.75 | -0.002122 | ... | 3 | 3335212.0 | None | 4 | A1 | 1 | 5 | A2 | 6 | 2911.348160 |
| 6 | 6 | 1 | 2911.285780 | 3.903000e-23 | 0.042810 | 0.0576 | 0.070 | 575.2852 | 0.67 | -0.007600 | ... | 3 | 4345363.0 | None | 10 | F2 | 3 | 9 | F1 | 36 | 2911.286321 |
| 7 | 6 | 1 | 2911.261561 | 6.751000e-23 | 0.074010 | 0.0572 | 0.070 | 575.1841 | 0.67 | -0.008480 | ... | 3 | 4345363.0 | None | 10 | F1 | 1 | 9 | F2 | 35 | 2911.261846 |
| 8 | 6 | 1 | 2911.186061 | 5.284000e-23 | 0.057940 | 0.0576 | 0.070 | 575.2596 | 0.67 | -0.007580 | ... | 3 | 4345363.0 | None | 10 | F1 | 2 | 9 | F2 | 35 | 2911.186087 |
9 rows × 26 columns
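\(\rightarrow\) The matching step can be pictured as a nearest-neighbour search with a distance cutoff. The sketch below is one way to do it with NumPy and is not the Xpectra implementation: match_closest_lines is a hypothetical helper, and it assumes the threshold is a maximum distance in the same units as the inputs. The first three peaks and the line positions are taken from the tables above; the fourth peak (2911.9) is invented to show an unmatched case.

```python
import numpy as np


def match_closest_lines(peak_centers, line_positions, threshold):
    """For each peak, return the index of the closest line,
    or -1 if no line lies within `threshold` of the peak."""
    peak_centers = np.asarray(peak_centers, dtype=float)
    line_positions = np.asarray(line_positions, dtype=float)
    # Pairwise |peak - line| distances, shape (n_peaks, n_lines)
    distances = np.abs(peak_centers[:, None] - line_positions[None, :])
    closest = distances.argmin(axis=1)
    within = distances[np.arange(len(peak_centers)), closest] <= threshold
    return np.where(within, closest, -1)


peaks = [2911.697485, 2911.676406, 2911.623123, 2911.9]  # last one is made up
lines = [2911.622555, 2911.674563, 2911.697399]          # HITRAN nu values

matches = match_closest_lines(peaks, lines, threshold=0.01)
print(matches.tolist())
```

The pairwise distance matrix is simple but grows as n_peaks × n_lines; for a full line list, sorting the line positions and using np.searchsorted would scale better.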
3.5 Save the results: plots and DataFrames
\(\rightarrow\) Use the plot-saving functionality of hitran_line_assigner:
[12]:
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['nu','peak_center'],
                            wavenumber_range = wavenumber_range,
                            __save_plot__ = True, # Save the plot (seaborn version)
                            __reference_data__ = __reference_data_path__)
<Figure size 7000x4200 with 0 Axes>
[13]:
# Add peak heights as a new column (one value per matched peak, in the same order)
assign.fitted_hitran['peak_heights'] = peak_heights
\(\rightarrow\) Save the fitted HITRAN DataFrame to a CSV file:
[14]:
df = assign.fitted_hitran
# Define file name
file_name = "closest_hitran_lines_auto.csv"
# Save DataFrame to CSV
df.to_csv(os.path.join(__reference_data_path__,'processed_data',file_name), index=False)
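\(\rightarrow\) A quick way to confirm that a saved file is usable is to read it back and compare it with the original. The sketch below does this round trip with a small stand-in DataFrame and a temporary directory, so it does not touch the processed_data directory; the three columns and their values are copied from the tables above purely for illustration.

```python
import os
import tempfile

import pandas as pd

# Stand-in for assign.fitted_hitran with a few of its columns
df = pd.DataFrame({
    "nu": [2911.697399, 2911.674563],
    "peak_center": [2911.697485, 2911.676406],
    "peak_heights": [0.210332, 0.106295],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "closest_hitran_lines_auto.csv")
    df.to_csv(out_path, index=False)   # index=False avoids an extra unnamed column
    reloaded = pd.read_csv(out_path)

print(reloaded)
```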